Optimizing postprandial glucose prediction through integration of diet and exercise: Leveraging transfer learning with imbalanced patient data

Background In recent years, numerous methods have been introduced to predict glucose levels using machine-learning techniques on patients' daily behavioral and continuous glucose data. Nevertheless, a definitive consensus remains elusive regarding modeling the combined effects of diet and exercise for optimal glucose prediction. A notable challenge is the propensity for models trained on observational patient datasets from uncontrolled environments to overfit due to skewed feature distributions of target behaviors; for instance, diabetic patients seldom engage in high-intensity exercise post-meal. Methods In this study, we introduce a unique application of Bayesian transfer learning for postprandial glucose prediction using randomized controlled trial (RCT) data. The data comprise a time series of three key variables: continuous glucose levels, exercise expenditure, and carbohydrate intake. To build the optimal model for predicting postprandial glucose levels, we initially gathered balanced training data from RCTs on healthy participants by randomizing behavioral conditions. Subsequently, we pretrained the model's parameter distribution using RCT data from the healthy cohort. This pretrained distribution was then adjusted, transferred, and utilized to determine the model parameters for each patient. Results The efficacy of the proposed method was appraised using data from 68 gestational diabetes mellitus (GDM) patients in uncontrolled settings. The evaluation underscored the enhanced performance attained through our method. Furthermore, when modeling the joint impact of diet and exercise, the synergetic model proved more precise than its additive counterpart. Conclusion An innovative application of transfer learning utilizing randomized controlled trial data can improve the challenging modeling task of postprandial glucose prediction for GDM patients, integrating both dietary and exercise behaviors. For more accurate prediction, future research should focus on incorporating the long-term effects of exercise and other glycemic-related factors such as stress and sleep.


Introduction
The global incidence of diabetes is on the rise, accompanied by escalating severity. This progression entails detrimental ramifications including compromised quality of life (QoL), multifarious complications, and costly surgical treatment. Projections indicate a staggering $2.5 trillion global expenditure on diabetes-related medical costs by 2030 [1]. In light of this, there is an imperative to curtail these expenses while ameliorating QoL by proactively mitigating diabetes severity.
Recent national clinical guidelines [2] underscore a fundamental tenet of preventive intervention: the maintenance of blood glucose levels within the normative spectrum. A pivotal approach to achieving this control involves adopting a balanced lifestyle encompassing dietary measures, physical activity, and insulin therapy. Technological strides in continuous glucose monitoring (CGM) apparatus have empowered patients in managing glucose levels via mobile applications in the comfort of their homes, facilitating self-care [3,4]. However, mastering optimal behavioral adjustments for maintaining normoglycemia remains a challenge for patients [5]. Hence, a personalized framework is imperative, one that tailors recommendations for optimal individual behaviors, thereby ensuring the trajectory of future blood glucose levels aligns with the norm. This ambition necessitates the precise anticipation of how behavioral modifications will influence forthcoming blood glucose dynamics.
To date, various data-driven techniques for predicting glucose, incorporating behavioral factors, have emerged [6,7]. Prior literature primarily focused on dietary and exercise facets, employing time-series machine learning models like the autoregressive model [8] and long short-term memory (LSTM) [9] for glucose forecasting. Independent positive impacts on predictive accuracy have been demonstrated for dietary and exercise factors [7]. Yet, the synergy between these factors in achieving optimal prediction remains understudied. Recent clinical investigations [10-14] have illuminated that strategic synchronization of diet and exercise, such as moderate postprandial exercise, holds potential for further glucose reduction across diabetes profiles. However, the translation of these discoveries into predictive glucose modeling remains unexplored.
When considering the amalgamation of multiple behaviors, a critical hurdle is sidestepping overfitting in learning from an imbalanced patient dataset collected in unconstrained settings. Taking the instance of diet and exercise integration, the frequency of intermediate-level post-meal exercise tends to be notably lower than that of minimal or no post-meal exercise among gestational diabetes mellitus (GDM) patients leading their daily lives [15]. Consequently, an accurate estimation of the combined impact of postprandial exercise and diet becomes challenging due to sparse data on moderate- to high-intensity postprandial exercise.
In recent years, an innovative solution has emerged to tackle data imbalance concerns by harnessing extensive patient data through transfer learning techniques [9,16-18]. This strategy thrives when the feature distributions of other patient data exhibit a range of values. Yet, when feature distribution across all patients is markedly imbalanced, the risk of extracting and transferring inaccurate insights escalates.
This paper proposes a novel application of transfer learning for glucose prediction (Fig 1), which harmonizes the distribution of behavioral features by synergizing with supplementary intervention data from a randomized controlled trial (RCT). This harmony is instrumental in the predictive modeling of postprandial glucose involving dietary and exercise variables. The method commences with an RCT, where behavioral conditions are randomized for a healthy cohort, amassing data with a balanced distribution. Subsequently, Bayesian parameter learning is executed on the prediction model utilizing the RCT data, yielding a dependable parameter distribution. Ultimately, this pre-trained distribution is judiciously rescaled and employed as a prior for each patient's parameter learning utilizing observational data from real-world scenarios. This ensures a robust knowledge transfer from the RCT domain, curtailing overfitting risks inherent in imbalanced patient data. Empirical validation underscores the efficacy of this method, as evidenced by enhanced prediction performance in postprandial glucose prognosis using an authentic GDM patient dataset.

A. Integrative glucose prediction with diet and physical activity
Numerous machine learning techniques incorporate dietary and exercise factors to anticipate blood glucose levels. Jankovic et al. [8] and Xie [19] introduced an autoregressive model wherein the effect of carbohydrate intake and the effect of exercise, driven by muscular energy expenditure, act independently on glucose levels. Likewise, support vector regression [20] and a physiological model [21] have been advanced for glucose prediction by integrating time-varying dietary and exercise influences, as computed through ordinary differential equations.
However, these methodologies train predictive models on individual patient historical data collected in unconstrained settings, and such uncontrolled data settings introduce model misspecifications. This stems from the inherent imbalance in dietary or exercise feature distribution, attributed to each patient's distinct and established lifestyle. Moreover, limitations in data volume per patient compound the issue.

B. Glucose prediction with transfer learning
The primary hurdle in blood glucose level prediction lies in the limited quality and quantity of patient data available for robust model training. In particular, sophisticated deep learning techniques demand substantial training data volumes. In recent times, various strategies employing transfer learning to address this challenge have emerged. Transfer learning endeavors to construct an apt model for a target domain (referred to as a target task) by extrapolating model knowledge acquired from other domain datasets (termed source tasks) [22,23].
For instance, Faruqui et al. [9] suggested an approach entailing initial model learning using population data from a patient group as a source task, followed by transferring the model for individual patient-specific learning. Other studies have filtered a subset of population data based on its resemblance to a target patient, employing it as training data for either a source task [9] or a target task [17]. Furthermore, to facilitate knowledge sharing among patients, a multitask learning strategy [24] was proposed, addressing individual model learning for all patients in parallel. Additionally, for harnessing data from diverse patient groups, adversarial transfer learning [16] was proposed, which pre-aligns feature representations between patient groups.
Diverging from these approaches, the introduced method (Fig 1) takes a distinct stance. First, while prevailing methods seek to augment individual data volume by integrating other patient data, our approach focuses on enhancing observational data quality through leveraging data from randomized controlled trials (RCTs). Second, our method stands apart by proactively intervening to procure high-quality data, with experimental conditions randomized based on the target model structure intended for learning.

Methods
In this section, we elucidate the problem's context, expound on the modeling of dietary and exercise impacts for postprandial glucose prediction, and subsequently detail the implementation of transfer learning utilizing the RCT dataset.

A. Problem setting
We aimed to develop a predictive model that integrates the intertwined influences of both diet and exercise. Consequently, our study concentrated on forecasting postprandial glucose levels within the context of concurrent dietary and exercise effects on blood glucose. The anticipation of postprandial glucose levels assumes paramount significance in furnishing optimal behavioral suggestions, given the consistent post-meal surge in glucose levels, prone to deviations from the norm [25,26].
Consider now postprandial glucose levels with regard to the patient's diet. When the start time of the diet is $t^*$, the target postprandial glucose level is represented as the time series $y_{t^*+1:t^*+T}$ of the target patient. Because the glucose level within 1 h after a diet is of clinical importance, we set T = 90 min. In addition, it is well known that carbohydrate intake (CI) increases glucose, energy expenditure (EE) from exercise lowers glucose immediately, and the features of CI [27] and EE [8] perform well for glucose prediction. Suppose we have three types of observation variables from the patient: (i) the glucose level history before the diet $y^H_{1:t^*}$, (ii) the CI sequence $x_{1:M}$ from the diet and the corresponding intake timing sequence $\tau^x_{1:M}$, and (iii) the EE sequence $z_{1:N}$ from exercise around a diet and the corresponding exercise timing sequence $\tau^z_{1:N}$. $M$ and $N$ denote the number of carbohydrate intakes and exercises, respectively, within one diet. Accordingly, as illustrated in Fig 2, we aim to solve a time-series regression problem to predict the target variable $y_{t^*+1:t^*+T}$ from the observable variables $y^H_{1:t^*}, x_{1:M}, z_{1:N}, \tau^x_{1:M}, \tau^z_{1:N}$. In what follows, the target and observable variables are written without subscripts as $y, y^H, x, z, \tau^x, \tau^z$, respectively.

B. Bayesian predictive model for postprandial glucose
Our model is based on an interpretable Bayesian regression model, as shown in Fig 3. This preference is rooted in the model's inherent transparency and traceability in contrast to complex machine learning constructs like LSTM. This transparency holds paramount significance in ensuring effective quality control for the model's real-world applications.
Following this approach, our model is based on a cutting-edge Bayesian model for glucose prediction [28], wherein forthcoming blood glucose levels are predicted as the sum of the time-series response to a treatment (such as carbohydrate intake) and a baseline glucose level, with Gaussian noise added. Our study delves into two variants: an additive model and a synergetic model, both designed to account for combined effects. In the former, dietary and exercise responses are generated separately and linearly aggregated, aligning with preceding research [8,19]. Conversely, the latter model embraces interdependency between dietary and exercise responses in a synergetic manner. This is substantiated by contemporary medical insights that highlight how the impact of postprandial exercise on glucose reduction is contingent upon carbohydrate intake levels [29] and the elevation in postprandial blood glucose [30]. The synergetic model assumes that such interactive impacts manifest through the multiplication of dietary and exercise effects. Furthermore, for performance benchmarking, we also explore a single-effect model relying solely on dietary effects, as previously addressed [28].
Specifically, the single-effect, additive-effect, and synergetic-effect models are given by Eqs 1-3, respectively, where $y$, $R_d$, and $R_e$ are time series, and $y_{base}$ represents the baseline blood glucose level, excluding the effects of diet and exercise. Since the time range for focusing on postprandial blood glucose is quite short, we assume that $y_{base}$ is constant and substitute the median value of the history of preprandial blood glucose $y^H$ from 15 min before the meal. $R_d$ indicates the dietary effect of increasing glucose levels, and $R_e$ indicates the exercise effect of decreasing glucose levels. $\epsilon$ represents the noise, which we assume follows $N(0, \sigma)$. Furthermore, $R_d$ and $R_e$ are represented as time-series responses to each CI or EE treatment. These responses appear after the initiation of a dietary or exercise event, and these effects diminish over time. We assume that this process, wherein the effects increase and then decrease over time, can be represented by a bell-shaped function, following the literature [28]. Additionally, considering real-world scenarios where patients occasionally have successive meals within 90 min and often perform postprandial exercise multiple times, $R_d$ and $R_e$ are each modeled as the summation of responses over multiple treatments (Eqs 4 and 5),
where $M$ and $N$ denote the numbers of carbohydrate intakes and exercise sessions, respectively. Here, we adopt a bell-shaped function as the response function, following [28], because of its interpretability and smaller number of parameters. In this function, $\tau_{x,i}$ and $\tau_{z,j}$ represent the times when CI and EE start to occur, respectively. In addition, the functions are amplified by the treatment dose, that is, the amount of CI ($x_i$) of the i-th intake or the amount of EE ($z_j$) of the j-th exercise. The response-function parameters ($\alpha_d$ and $\beta_d$ for diet; $\alpha_e$, $\beta_e$, and, in the synergetic model, $C$ for exercise) are person-specific. We postulate that individuals within the healthy group demonstrate similar dietary and exercise effects, and the same principle applies to the patient group. Accordingly, we introduce a hierarchical prior distribution of person-specific parameters, enabling stable parameter learning by sharing parameter knowledge across individuals within each group. We assume this hierarchical prior is Gaussian, centered on $\tilde{\Theta} = (\tilde{\alpha}_d, \tilde{\beta}_d, \tilde{\alpha}_e, \tilde{\beta}_e)$ for the additive model and $\tilde{\Theta} = (\tilde{\alpha}_d, \tilde{\beta}_d, \tilde{\alpha}_e, \tilde{\beta}_e, C)$ for the synergetic model, given that we have no specific knowledge regarding the prior. These hyperparameters are common to each person in the same group (healthy or patient group) and are learned for each group, as shown in Fig 3.
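The structure of the three models can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian-bump form of the bell-shaped response (peaking 3α minutes after onset with height β times the dose) and the multiplicative form `y_base + R_d * (1 + C * R_e)` used for the synergetic model are assumptions made here for illustration, and all function names and parameter values are hypothetical.

```python
import numpy as np

def bell_response(t, onset, dose, alpha, beta):
    """Bell-shaped response to one treatment (ASSUMED Gaussian-bump form).

    Zero before the onset; peaks 3*alpha minutes after onset with height beta*dose.
    """
    t = np.asarray(t, dtype=float)
    bump = beta * dose * np.exp(-0.5 * ((t - onset - 3.0 * alpha) / alpha) ** 2)
    return np.where(t >= onset, bump, 0.0)

def total_response(t, onsets, doses, alpha, beta):
    """Sum the responses over multiple treatments (several intakes or exercise bouts)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    for onset, dose in zip(onsets, doses):
        out += bell_response(t, onset, dose, alpha, beta)
    return out

def predict(t, y_base, meals, exercises, params, kind="synergetic"):
    """Noise-free mean trajectory for the single, additive, or synergetic model."""
    r_d = total_response(t, meals["onsets"], meals["doses"],
                         params["alpha_d"], params["beta_d"])
    if kind == "single":
        return y_base + r_d
    r_e = total_response(t, exercises["onsets"], exercises["doses"],
                         params["alpha_e"], params["beta_e"])
    if kind == "additive":
        return y_base + r_d + r_e
    if kind == "synergetic":
        # ASSUMPTION: one possible multiplicative interaction, scaled by C.
        return y_base + r_d * (1.0 + params["C"] * r_e)
    raise ValueError(f"unknown model kind: {kind}")

# Hypothetical values: one 50 g intake at t = 0 and one 80 kcal bout at t = 30 min.
t = np.arange(0, 91, 5)
params = dict(alpha_d=10.0, beta_d=0.05, alpha_e=5.0, beta_e=-0.01, C=1.0)
meals = {"onsets": [0.0], "doses": [50.0]}
exercise = {"onsets": [30.0], "doses": [80.0]}
no_exercise = {"onsets": [], "doses": []}
single = predict(t, 5.0, meals, no_exercise, params, "single")
additive = predict(t, 5.0, meals, exercise, params, "additive")
```

With a negative β_e the exercise response lowers the trajectory, and with no exercise events all three variants reduce to the single-effect model.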

C. Bayesian transfer learning with prior rescaling from RCT data
Free-environment patient data are imbalanced in the distribution of the amount of each treatment (e.g., moderate-intensity exercise after a diet is significantly infrequent), which causes overfitting of the above hyperparameter set $\tilde{\Theta}$. To address this challenge, in our proposed method, the hyperparameter set $\tilde{\Theta}$ of the patient group domain is learned through transfer learning with RCT data actively collected from the healthy group domain.
Initially, we enlist healthy individuals as participants for the randomized controlled trial (RCT). This choice stems from the fact that intervening on healthy individuals is comparatively acceptable, owing to their fewer underlying health issues. To achieve balanced dose distributions for each treatment, data collection is structured to randomize treatment conditions systematically. Following this, leveraging the distributional insights garnered from the acquired RCT data, we facilitate effective learning within the patient group domain by applying and adapting the knowledge embedded in the learned hyperparameter set.
In addition, our target parameters of knowledge transfer are limited to only the exercise-related hyperparameter set $\tilde{\Theta}_e = (\tilde{\alpha}_e, \tilde{\beta}_e, C)$ from the set $\tilde{\Theta}$, because the principle of the dietary effect to increase glucose differs between the diabetic group and the healthy group due to the significant difference in insulin sensitivity. In our RCT, we control only for the amount of exercise after the diets for each participant in the healthy group. The experimental procedure is described in the next section.
Based on the above premise, in the context of transfer learning, our source task is to learn the exercise-related hyperparameter set $\tilde{\Theta}^S_e$ in the healthy group domain with interventional data under RCT, and our target task is to learn the hyperparameter set $\tilde{\Theta}^T_e$ in the diabetic group domain with observational data under free environments.
In this study, we adopt a comprehensive framework for prior rescaling as introduced by Xuan et al. [22] in the context of Bayesian Transfer Learning (BTL) [22,23]. This framework entails learning the probability distribution of parameters within the source task, subsequently rescaling this learned distribution, and employing it as an informative prior within the target task, as depicted in Fig 5(b). For this rescaling process, a technique [31] was put forth that scales the variance parameters using pre-estimated coefficients for the target task while preserving the mean parameter. Notably, adapting this framework to our specific challenges presents intricacies. To elucidate, the influence of exercise on glucose dynamics within the healthy cohort might diverge from that within the diabetic group. Indeed, differences in the efficacy of glucose uptake in leg muscle tissues between healthy and diabetic groups have been reported [32]. Consequently, this underscores the need for a mean parameter shift operation in our cross-domain prior rescaling.
In this context, we propose extending the general framework to robustly shift the mean of the pretrained distribution of the parameter set $\tilde{\Theta}_e$ based on clinical domain knowledge in prior rescaling, as shown in Fig 5(c). We realize the distribution shift by introducing a new adjustment parameter η to modulate the mean of the parameter $\tilde{\beta}_e$, which represents the strength of the exercise effect on the glucose trajectory. For this, η = 0.5 is suitable because the experimental results in Ref. [32] show that the glucose uptake efficiency of the leg muscle in the diabetic group was about half that of the healthy group. η for the other exercise effect parameters $\tilde{\alpha}_e$ and $C$ in $\tilde{\Theta}_e$ is assumed to be 1. Furthermore, another adjustment parameter λ is introduced to stabilize this distribution shift in the actual training process; λ is used to reduce the variance of each parameter. Adding this control prevents the overfitting triggered by imbalanced exercise data in the target domain. To manage the uncertainty in setting η and λ, we introduce hyperpriors for these parameters. We then use hierarchical Bayesian estimation with the patient dataset to determine their optimal values. Based on this, prior rescaling for the target parameter set $\tilde{\Theta}^T_e$ in the target task is given by Eq 7. The parameter set $\tilde{\Theta}_e$ follows a Gaussian distribution with a diagonal covariance matrix $\Sigma$, so that the variance of each parameter element is independent of the others. We assume that η follows a Gaussian hyperprior whose mean is 0.5 for $\tilde{\beta}_e$ and 1 for the others. We also assume that λ follows a Gaussian hyperprior with its mean set to 0.1. In practice, parameter learning is performed in two steps. In the first step, we pre-train the Gaussian parameters $\mu^S, \Sigma^S$ of the distribution of $\tilde{\Theta}^S_e$ with RCT data in the source task. In the second step, we rescale the resulting estimates $\hat{\mu}^S, \hat{\Sigma}^S$ as in Eq 7, and then learn all hyperparameter sets $\tilde{\Theta}^T_e, \tilde{\Theta}^T_d$ with observational patient data, in conjunction with learning the individual parameters $\Theta^T_e, \Theta^T_d$ for each diabetes patient. These parameters are learned by executing a Markov Chain Monte Carlo (MCMC) simulation with the No-U-Turn Sampler implemented in RStan [33]. The details of the simulation and the model assumptions are described in the 'Experimental setup E' subsection.
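The prior-rescaling step can be sketched as follows. This is an illustrative reading of the text rather than a reproduction of Eq 7: it assumes that η multiplies the source mean element-wise and that λ multiplies the diagonal variances, and it treats η and λ as fixed numbers, whereas the paper places Gaussian hyperpriors on them and estimates them jointly. The function name and all numeric values are hypothetical.

```python
import numpy as np

def rescale_prior(mu_source, var_source, eta, lam):
    """Turn a pretrained source posterior into a target-task prior.

    ASSUMPTION: mean is shifted multiplicatively by eta (element-wise) and the
    diagonal variances are shrunk multiplicatively by lam.
    """
    mu_source = np.asarray(mu_source, dtype=float)
    var_source = np.asarray(var_source, dtype=float)
    mu_target = np.asarray(eta, dtype=float) * mu_source
    var_target = lam * var_source
    return mu_target, var_target

# Hypothetical source estimates, ordered as (alpha_e, beta_e, C).
mu_s = [10.0, -0.02, 1.0]
var_s = [4.0, 1e-4, 0.25]
# eta = 0.5 only for beta_e (exercise-effect strength); 1 for the others. lam = 0.1.
mu_t, var_t = rescale_prior(mu_s, var_s, eta=[1.0, 0.5, 1.0], lam=0.1)
```

Setting every element of η to 1 corresponds to the normal transfer scheme that preserves the mean, while η = 0.5 for the exercise-strength parameter reflects the roughly halved leg-muscle glucose uptake reported in [32].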

Experimental setup
This paper addresses two pivotal research questions.
1. How should the utilization of an RCT dataset for transfer learning be structured to acquire a predictive glucose model from an imbalanced patient dataset?
2. How can the fusion of diet and exercise be effectively modeled to optimize the prognostication of postprandial glucose levels?
To answer these questions, we built multiple patterns of predictive models with multiple patterns of transfer learning and compared their performance based on dedicated metrics using a real-world clinical dataset of GDM patients.

A. Evaluation policy
First, for question (1), we compared the performance of each predictive model built with and without the normal or extended transfer learning described in the 'Methods C' subsection. Second, for question (2), we evaluated and compared the performance of the single-effect model [28], the additive model, and the synergetic model described in the 'Methods B' subsection as predictive models. In total, we compared the performance of seven models: the single-effect model, and the additive and synergetic models each trained with no transfer learning, normal transfer learning, or extended transfer learning. In cases without transfer learning, we substituted a non-informative prior for the hyperparameter set $\tilde{\Theta}^T_e$. Notably, for any model pattern, knowledge of exercise-related parameters is still shared among all patients in learning through the hierarchical prior ('Methods B' subsection). Furthermore, note that the additive model has a limited exercise-related parameter set $\tilde{\Theta}_e = (\tilde{\alpha}_e, \tilde{\beta}_e)$, which is a subset of the parameters of the synergetic model (see Eqs 2 and 5).

B. Clinical data
The clinical data used for our performance evaluation were from a real-world free-environmental dataset of 72 patients with GDM, including continuous glucose levels, physical activity levels, and dietary records. This dataset, sourced from real-world environments, emerged from a clinical trial orchestrated by the authors [5]. This trial was granted ethical sanction by the Ethics Committee of Helsinki and the Uusimaa University Hospital District. The recruitment effort targeted patients (aged 18 to 45 years) with GDM within the gestational window of 24-28 weeks, sourced from maternity clinics in the Helsinki metropolitan area between March 10, 2021 and December 12, 2022. The exclusion criteria for this recruitment were as follows: type 1 or type 2 diabetes, physical disability, use of medication that influences glucose metabolism, multiple pregnancy, current substance abuse, severe psychiatric disorder, and significant difficulty in cooperating [5]. Written informed consent was obtained from all patients, and from both parents on behalf of the infant. Data acquisition occurred across 3-day intervals in monthly sessions leading up to childbirth. It is important to note that this analysis represents a secondary examination of the eMoM GDM study [5].
Throughout each session, a continuous glucose monitoring (CGM) system (Guardian Connect System, Medtronic Ltd.) facilitated 5-minute interval glucose tracking for every patient, as illustrated in Fig 6. Concurrently, data related to physical activity were collected through a wrist-worn activity tracker (Vivosmart3, Garmin International Ltd.). Energy expenditure during exercise was automatically computed via the tracker. Dietary information encompassed nutrient quantities, including carbohydrate intake (CI), ingested at each time point, sourced from patients' manual food logs via a food tracking application developed by Helsinki University Hospital. Nutritional data integrity was fortified through nutritionist validation calls. Further information on the experimental protocol can be found in [5].

C. Preprocessing
Given our focus on postprandial blood glucose as the prediction target, we partitioned the continuous glucose data and accompanying variables around each mealtime, as depicted in Fig 6. Each segment encompassed a time span of 15 minutes preceding a meal, extending to 90 minutes post-meal. Segments with absent continuous glucose data were omitted from analysis, leading to the exclusion of 4 patient datasets. As a result, 1619 segments of data were obtained from 68 patients. Subsequently, the segment data in the first two days of each session were used as the training data $D^T_{\mathrm{train}}$, and the segment data in the last day were used as the test data $D^T_{\mathrm{test}}$. The predictive model was trained in each session and its evaluation was performed within that session. This is because the segment data of the next month's session are heterogeneous from that of the current month, as the condition of a pregnant woman changes drastically after one month, even for the same individual [34].
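The segmentation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming timestamps in minutes on a regular 5-minute CGM grid and missing readings encoded as NaN; the function name, the minimum-sample rule, and the returned dictionary layout are hypothetical choices, not the authors' pipeline. The baseline is computed as the median of the pre-meal readings in the window, following the definition of the baseline glucose level in the Methods.

```python
import numpy as np

def segment_around_meal(times, glucose, meal_time, pre=15, post=90, step=5):
    """Cut one meal-centred segment; return None if CGM data are missing.

    times: sample times in minutes; glucose: CGM readings (NaN = missing).
    """
    times = np.asarray(times, dtype=float)
    glucose = np.asarray(glucose, dtype=float)
    mask = (times >= meal_time - pre) & (times <= meal_time + post)
    t_rel, y = times[mask] - meal_time, glucose[mask]
    # ASSUMPTION: a complete window holds at least (pre + post) / step samples.
    min_samples = (pre + post) // step
    if y.size < min_samples or np.isnan(y).any():
        return None
    # Baseline: median of the readings from 15 min before the meal up to meal start.
    baseline = float(np.median(y[t_rel <= 0]))
    return {"t_rel": t_rel, "glucose": y, "baseline": baseline}

# Hypothetical 5-hour trace with a meal logged at minute 60.
times = np.arange(0, 300, 5)
glucose = 5.0 + 0.01 * times
segment = segment_around_meal(times, glucose, meal_time=60)
```

Segments returned as `None` would be dropped, and the remaining ones assigned to training or test data according to the day of the session on which the meal occurred.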

D. Randomized controlled trial
The RCT data for our source task were collected from 4 additionally recruited healthy subjects (Aalto University students) in a six-day session with the same data collection as that in the clinical trial. The recruitment period was from February 1, 2023 to April 30, 2023. Written informed consent was obtained from all the participants for data collection and utilization. The purpose of the RCT was to obtain a dataset in which the distribution of EE during postprandial exercise enables robust learning of the exercise-related parameter set $\tilde{\Theta}^S_e$. Therefore, the postprandial exercise conditions during data collection were randomized for each participant. In practice, the subjects were instructed to follow different conditions, as follows (see Fig 1).

• Low-intensity condition (Day 1 & Day 4)
  • Eat almost the same amount of carbohydrate at lunch (or dinner).
  • Don't perform any exercise until two hours after eating.
• Moderate-intensity condition (Day 2 & Day 5)
  • Eat almost the same amount of carbohydrate at lunch (or dinner).
  • 30 minutes after starting eating, walk continuously at a pace of 100 steps/min for 20 minutes.
  • After walking, don't perform any exercise until two hours after eating.
• High-intensity condition (Day 3 & Day 6)
  • Eat almost the same amount of carbohydrate at lunch (or dinner).
  • 30 minutes after starting eating, walk continuously at a pace of 130 steps/min for 20 minutes.
  • After walking, don't perform any exercise until two hours after eating.
The above exercise conditions were designed based on clinical findings of the effect of exercise on glucose. Specifically, according to the literature [35], the appropriate timing of exercise for decreasing blood glucose was reported to be 30 min after the start of meals, when the exercise content was continuous walking for 20 min.
While the exercise condition differed between the days, the dietary condition was controlled to be the same for all days for each subject, as shown in Fig 1. This is because, by equalizing the dietary effect on postprandial glucose among all days in a subject, the learning of the targeted exercise parameter set $\tilde{\Theta}^S_e$ can be facilitated in our source task. Additionally, the participants were asked to choose breakfast or lunch as the target diet for each day.
The RCT and data preprocessing resulted in 24 segments of data $D^S_{\mathrm{train}}$. Fig 7 shows the actual distribution of the EE in the interventional RCT and observational GDM datasets.

E. Parameter learning and prediction
The posterior of the overall parameter sets involved in each of the seven model patterns was estimated by MCMC simulation using both the observational training data $D^T_{\mathrm{train}}$ of the GDM group and the interventional RCT data $D^S_{\mathrm{train}}$ of the healthy group. Subsequently, we obtained the point estimates $\hat{\Theta}^T_e, \hat{\Theta}^T_d$, which are the sets of posterior medians of each parameter for each patient. Then, the predicted future postprandial glucose trajectory $\hat{y}_{t^*+1:t^*+T}$ was obtained by applying the model embedded with $\hat{\Theta}^T_e, \hat{\Theta}^T_d$ to the observed variable values $y^H, x, z, \tau^x, \tau^z$ in the test data $D^T_{\mathrm{test}}$ of the GDM group. Here, we delineate the assumptions underpinning our Bayesian models. All models outlined in the 'Experimental setup A' subsection incorporate person-specific parameters, $\alpha_d$ and $\beta_d$, of the dietary response function (Eq 4). The prior distributions for these parameters were set as $\alpha_d \sim N(\tilde{\alpha}_d, 10)$ and $\beta_d \sim N(\tilde{\beta}_d, 0.1)$, respectively. The values of the variance parameters were determined based on the outcomes of learning the parameter distribution using an identical model in the literature [28]. Additionally, the priors of the hyperparameters $\tilde{\alpha}_d$ and $\tilde{\beta}_d$ adhered to a uniform distribution. Conversely, we lack prior knowledge of our original parameters $\alpha_e$, $\beta_e$, and $C$ in the exercise response function (Eqs 3 and 5). As a result, we employed entirely uninformative distributions for $\alpha_e$, $\beta_e$, $C$, and their hyperparameter sets $\tilde{\Theta}_e$, $\mu_e$, $\Sigma_e$ in both the source and target tasks (Fig 5). For instance, the prior distribution of $\alpha_e$ was set as $N(\tilde{\alpha}_e, \sigma_{\alpha_e})$, and $\sigma_{\alpha_e}$ was intended to follow a uniform distribution.
In order to diagnose the convergence of parameter learning with MCMC, we employed the Gelman-Rubin diagnostic [36]. We executed multiple MCMC chains, compared the variance both within and between these chains, and computed the Gelman-Rubin statistic ($\hat{R}$) as the convergence indicator for all the parameters. The estimation was performed by setting the number of MCMC chains to four, the number of sampling iterations to 4000, and the burn-in period to 2000 iterations.
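For readers unfamiliar with the diagnostic, the classic Gelman-Rubin statistic can be computed from post-burn-in draws as in the sketch below. It implements the textbook within-chain versus between-chain variance comparison for a single scalar parameter; Stan reports refined variants of this statistic, so the values it prints may differ slightly. The function name and the simulated chains are hypothetical.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic R-hat for one scalar parameter.

    chains: array of shape (m, n) holding m chains of n post-burn-in draws.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()       # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)    # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n     # pooled variance estimate
    return float(np.sqrt(var_hat / within))

# Four simulated chains of 2000 retained draws each (4000 iterations, 2000 burn-in).
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])  # two chains in another mode
r_mixed, r_stuck = gelman_rubin(mixed), gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree with one another, whereas chains stuck in different regions yield a value well above 1.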

F. Metrics
The most important evaluation criterion is the extent to which the predicted glucose series $\hat{y}$ is coincident with the actual observation series $y$. Therefore, we evaluated (1) the root mean squared error (RMSE) of the predicted series and (2) the mean absolute error (MAE). Additionally, we evaluated the degree of coincidence of (3) the area under the curve (AUC) and (4) the postprandial maximum value of the glucose series because these are well-known glycemic indicators for diabetes research [37].
(1) $\mathrm{RMSE} = \sqrt{\frac{1}{T}\sum_{t=t^*+1}^{t^*+T}\left(\hat{y}_t - y_t\right)^2}$
(2) $\mathrm{MAE} = \frac{1}{T}\sum_{t=t^*+1}^{t^*+T}\left|\hat{y}_t - y_t\right|$
After calculating each metric following the above equations for all segment data included in the test data $D^T_{\mathrm{test}}$, the average of the metric values among all segments was used as the final metric score.
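The four metrics can be computed per segment with a short Python sketch. RMSE and MAE follow their standard definitions; expressing the AUC and peak criteria as absolute differences between the predicted and observed values, and integrating the AUC with the trapezoidal rule on a 5-minute grid, are assumptions made here for illustration. The function names and the toy series are hypothetical.

```python
import numpy as np

def auc(y, step=5.0):
    """Area under a regularly sampled curve via the trapezoidal rule."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0) * step)

def segment_metrics(y_true, y_pred, step=5.0):
    """Per-segment scores; AUC and peak are compared as absolute differences (ASSUMED)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "mae": float(np.mean(np.abs(err))),
        "auc_error": abs(auc(y_pred, step) - auc(y_true, step)),
        "peak_error": abs(float(y_pred.max()) - float(y_true.max())),
    }

def average_metrics(per_segment):
    """Average each metric over all test segments to obtain the final scores."""
    return {k: float(np.mean([m[k] for m in per_segment])) for k in per_segment[0]}

# Toy example with four samples per segment.
scores = segment_metrics([5.0, 6.0, 7.0, 6.0], [5.0, 7.0, 7.0, 5.0])
final = average_metrics([scores, scores])
```

The final score for each model is then the mean of each metric across all test segments.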

Result
Table 1 presents the conclusive average metric scores for each model across segments both with and without postprandial exercise.Within this context, an exercise segment is delineated by an energy expenditure (EE) exceeding 60 kcal.Focusing initially on segments involving postprandial exercise, we observe that the synergetic model, trained through extended transfer learning with RCT data, achieves the highest performance.Additionally, the augmentation of performance is evident across both the additive and synergetic models due to extended transfer learning, affirming its efficacy.Notably, this enhancement is particularly pronounced in the synergetic model, attributed to its more intricate model structure.Conversely, across segments devoid of postprandial exercise, the metrics remain largely consistent across all models.This consistency aligns with expectations, given that the exercise effect (R e ) within the additive or synergetic model approximates zero in the absence of postprandial exercise.Consequently, the forecasted glucose trajectory converges with that of the single model (Eqs 1-3).
The differences in the glucose trajectory predicted by the synergetic model under each training scheme are depicted in Fig 8. The cases of (a) no transfer and (b) regular transfer show discrepancies between the predicted values (dark pink line) and the actual values (black dots). This discrepancy stems from an overestimation of the exercise effect, which suppresses the diet-induced glucose response (light pink line) after exercise events (green bars). In contrast, Fig 8(c) shows the efficacy of extended transfer learning, which incorporates a distributional shift from the RCT dataset. This approach yields an appropriately calibrated exercise effect in terms of intensity and timing, and consequently reduces the prediction errors.
Furthermore, Fig 9 illustrates examples of the trajectories predicted by the single, additive, and synergetic models. Notably, the glucose rise predicted by the single model (a) is markedly delayed compared with the actual rise. This delay is likely due to overfitting of the dietary parameters caused by the omission of the exercise effect. In contrast, the combined models (b) and (c) reproduce the glucose elevation well. These results demonstrate the effectiveness of our application of the transfer learning framework with RCT data and of the synergistic modeling of dietary and exercise effects on glucose.

Discussion
An intrinsic strength of the proposed methodology lies in the transparency and visibility of the trained models and their parameters, as depicted in Fig 3. This transparency makes it straightforward to use the models when formulating personalized behavioral recommendations for individual patients. Moreover, the integration of transfer learning with randomized controlled trial (RCT) data strengthens this capability by estimating the model parameters more precisely. For example, if the absolute value of the exercise parameter $\beta_e$ in Eq 5 is estimated to be small for some patients, the exercise effect $R_e$ is unlikely to appear readily, implying that the postprandial exercise recommended to them should be of moderate (or higher) intensity.

An additional advantage of the proposed transfer learning technique is its ability to establish a suitable prior distribution automatically. Practical Bayesian modeling often relies heavily on domain-specific knowledge for setting informative priors [36]. However, acquiring such knowledge for training complex time-series models is challenging, because the corresponding medical insights are still immature in many cases [38]. In response to this challenge, our approach enables the creation of prior distributions with minimal domain knowledge by leveraging RCT data acquired through active intervention. Furthermore, owing to the simplicity of our method (Fig 1), it is applicable to glucose prediction for diverse patient cohorts and to other complex prediction tasks in the healthcare domain.
However, an open and contentious issue is how much data a dedicated RCT must provide within our framework. By the law of large numbers, the more RCT data are collected, the more balanced the target distribution becomes, which makes parameter learning more robust. Nevertheless, collecting a substantial quantity of high-quality RCT data requires significant time and resources, because it demands large-scale experiments.
Therefore, in practice, the amount of RCT data should be determined flexibly depending on the results of the convergence diagnosis in parameter learning. In our source task, the values of the convergence indicator $\hat{R}$ [36] for the parameters in $\Theta^S = (\tilde{\alpha}_d, \tilde{\beta}_d, \tilde{\alpha}_e, \tilde{\beta}_e, C)$ were (1.0001, 1.0040, 1.0017, 1.0008, 1.0013), respectively. Because $\hat{R} < 1.1$ is conventionally taken to indicate that parameter learning has converged, the amount of RCT data can be considered sufficient to specify the posterior distribution of the parameters.

Limitation
This study focused primarily on the immediate impact of exercise on postprandial glucose levels, specifically the reduction caused by muscular fatigue. However, exercise is also known to have a longer-term impact on insulin sensitivity and glucose regulation through repeated physical activity [39]. Future research will incorporate these enduring effects into the glucose prediction model to improve accuracy and provide more comprehensive behavioral recommendations.
Moreover, the model currently does not account for other factors that can influence glucose levels, such as stress, sleep patterns, time zone differences, dietary history, and medical conditions [7,26,40]. Including these variables in future iterations of the model would enhance its predictive power and provide a more holistic understanding of glucose dynamics. The adaptability of the proposed method to other prediction tasks for different patient groups, such as those with type 1 or type 2 diabetes, will also be explored in future research.
Furthermore, the Bayesian model in the proposed method assumed Gaussian distributions for many of the parameter distributions. Although this assumption led to the improved predictive performance reported in the 'Result' section, it may not always be the best choice. Future research will explore and evaluate different types of probability distributions, along with diagnostics such as posterior predictive checking [36], to optimize the selection of distributions and potentially further enhance predictive performance.

Conclusion
We present an innovative application of the transfer-learning framework utilizing RCT data for postprandial glucose prediction, integrating both dietary and exercise behaviors. The effectiveness of the proposed method was assessed using real-world data collected from 68 patients with GDM in their everyday settings. The evaluation demonstrates that the proposed approach with transfer learning improves postprandial glucose prediction. Our findings also show that the synergetic model is more accurate than the additive model in modeling the combined factors. In practical terms, the applicability of our method is contingent upon the following conditions: (i) Time-series data, which include continuous glucose levels, exercise behaviors (including energy expenditure), and dietary behaviors (including carbohydrate intake), must be continuously collected from both a patient group and a control group. (ii) Within the control group, behavioral conditions, such as the timing and intensity of exercise, must be regulated for the subjects. Indeed, the potential of transfer learning for such time-series data in clinical outcome prediction has recently been demonstrated in various medical fields [41], including glucose prediction in diabetes [16-18,24]. Moving forward, our research aims to incorporate the prolonged glycemic impact of exercise routines to build a better predictive model for tailored recommendations.

Fig 3. Graphical model of postprandial glucose. Parameter sets of both the healthy group and the patient group are estimated separately with this same model. https://doi.org/10.1371/journal.pone.0298506.g003

$\beta_d$ and $\beta_e$ are parameters representing the strength of the above amplification for each treatment dose, while $\alpha_d$ and $\alpha_e$ are parameters representing the response speed to each treatment. Examples of $R_d$ and $R_e$ are shown in Fig 4. Furthermore, to enable the synergistic effect model in Eq 3, we introduce the adjustment parameter C for weighting the synergetic effect represented by $R_d \odot R_e$, where $\odot$ denotes the element-wise product.
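To make the role of the adjustment parameter C concrete, the sketch below contrasts an additive combination of the two response series with a synergetic one that adds the weighted element-wise product. The exact forms of Eqs 1-3 are not reproduced in this part of the text, so the way the terms are summed onto a baseline, the function name `combine_effects`, and all numerical values are assumptions made purely for illustration.

```python
import numpy as np


def combine_effects(baseline, r_d, r_e, c=None):
    """Combine diet (r_d) and exercise (r_e) response series.

    c=None  -> additive combination (assumed form): baseline + R_d + R_e
    c=float -> synergetic combination (assumed form): additive + C * (R_d * R_e),
               where * is the element-wise product
    """
    r_d = np.asarray(r_d, dtype=float)
    r_e = np.asarray(r_e, dtype=float)
    y = baseline + r_d + r_e
    if c is not None:
        y = y + c * (r_d * r_e)
    return y


# Made-up response series: diet raises glucose, exercise lowers it.
r_d = [0.0, 20.0, 40.0]
r_e = [0.0, -5.0, -10.0]

y_add = combine_effects(100.0, r_d, r_e)
y_syn = combine_effects(100.0, r_d, r_e, c=0.01)
print(y_add)
print(y_syn)
```

Under these assumed forms, the two combinations coincide wherever either response is zero, and they diverge only where diet and exercise effects overlap in time.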

Fig 5. Extension of transfer learning framework. In each transfer method, only the relationships between variables shown as bold lines are used for parameter learning. https://doi.org/10.1371/journal.pone.0298506.g005

• $M_{\mathrm{base}}$: Single-effect model
• $M_{\mathrm{add}}$: Additive model without transfer learning
• $M_{\mathrm{syn}}$: Synergistic model without transfer learning
• $M_{\mathrm{add+trans}}$: Additive model with normal transfer learning
• $M_{\mathrm{syn+trans}}$: Synergistic model with normal transfer learning
• $M_{\mathrm{add+trans\_ext}}$: Additive model with extended transfer learning
• $M_{\mathrm{syn+trans\_ext}}$: Synergistic model with extended transfer learning

Fig 7(a) reveals a pronounced imbalance in the feature distribution of postprandial exercises among 68 GDM patients observed over 18 days. Fig 7(b) confirms that the distribution from the RCT dataset is less imbalanced than that from the GDM dataset.

Fig 8. Difference in predicted glucose trajectory with each training pattern. https://doi.org/10.1371/journal.pone.0298506.g008

Here, Fig 10 shows actual examples of the posterior of the parameter $\beta_e$ estimated for each healthy participant (top) and each patient (bottom). As the example in Fig 8 belongs to patient P1 in Fig 10, we can confirm the overestimation of $\beta_e$ from Fig 10(a) and 10(b) at the bottom. In practical recommendation scenarios, this tendency results in excessively optimistic and potentially detrimental suggestions, because it assumes that minor exercise could significantly improve the patient's glucose profile. Yet, as depicted in Fig 10(c), this misalignment is corrected through extended transfer learning, which rescales the prior through the shifting and shrinking operations outlined in Eq 7. This underscores the efficacy of the proposed transfer learning technique: because the parameters are estimated precisely, it enables exercise recommendations that genuinely optimize postprandial glucose reduction for individual patients.